[Pipelines] Add DreamLite text-to-image and image-edit pipelines by Carlofkl · Pull Request #13815 · huggingface/diffusers

Carlofkl · 2026-05-27T06:16:54Z

Context

This PR integrates DreamLite — ByteDance's text-to-image / image-edit diffusion model — into diffusers, following an invitation from @NielsRogge to release the model on the Hub in diffusers format.

Related issue: ByteVisionLab/DreamLite#3 (comment)

Model cards (public, ungated):

Base (3-branch dual CFG): https://huggingface.co/carlofkl/DreamLite-base
Mobile (distilled, single forward): https://huggingface.co/carlofkl/DreamLite-mobile

Both repos use a diffusers branch (loaded via revision="diffusers") to keep the original ByteDance-internal main branch intact for backward compatibility with existing users.

What's added

src/diffusers/
├── models/unets/unet_dreamlite.py            # DreamLiteUNetModel
├── pipelines/dreamlite/
│   ├── __init__.py
│   ├── pipeline_dreamlite.py                  # DreamLitePipeline (3-branch dual CFG)
│   ├── pipeline_dreamlite_mobile.py           # DreamLiteMobilePipeline (distilled)
│   └── pipeline_output.py
└── (registered in src/diffusers/__init__.py, models/__init__.py,
    pipelines/__init__.py, utils/dummy_*.py)

docs/source/en/api/pipelines/dreamlite.md
tests/pipelines/dreamlite/
├── test_pipeline_dreamlite.py
└── test_pipeline_dreamlite_mobile.py

Architecture highlights

DreamLiteUNetModel — UNet-based denoiser conditioned on Qwen3-VL text/vision embeddings.
DreamLitePipeline — runs 3 forward passes per step (text-cond / image-cond / uncond) and combines them with a dual-CFG schedule for high-fidelity text-to-image and image edit.
DreamLiteMobilePipeline — distilled single-pass variant; no CFG; designed for on-device inference. Pairs with AutoencoderTiny.
Both pipelines use FlowMatchEulerDiscreteScheduler.
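
The dual-CFG combination used by DreamLitePipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline's code: the formula follows the edit-mode expression quoted in the review diff further down this page, while the helper name `combine_dual_cfg`, the plain-list inputs, and the scale values are hypothetical and chosen only for the example.

```python
from typing import List, Sequence


def combine_dual_cfg(
    uncond: Sequence[float],
    image_cond: Sequence[float],
    text_cond: Sequence[float],
    guidance_scale: float,
    image_guidance_scale: float,
) -> List[float]:
    """Combine three noise predictions element-wise (hypothetical helper)."""
    return [
        # Text guidance pushes away from the image-only branch,
        # image guidance pushes away from the unconditional branch.
        u + guidance_scale * (t - i) + image_guidance_scale * (i - u)
        for u, i, t in zip(uncond, image_cond, text_cond)
    ]


# Toy values standing in for one latent element from each branch.
combined = combine_dual_cfg(
    uncond=[0.0],
    image_cond=[1.0],
    text_cond=[3.0],
    guidance_scale=2.0,
    image_guidance_scale=1.5,
)
print(combined)
```

With both scales set to 1.0 the expression collapses to the text-conditioned prediction, which is a quick way to check an implementation of this formula.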

Testing

Loading smoke test against carlofkl/DreamLite-base with revision="diffusers" — all 6 sub-modules resolve to the correct diffusers.* namespace.
Inference smoke test — generates a 1024×1024 image in ~0.6s/step on a single A800; output stats sane (std≈93, no NaN/Inf).
Standard pipeline tests in tests/pipelines/dreamlite/.
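
The kind of output sanity check mentioned in the inference smoke test (standard deviation, no NaN/Inf) might look like the sketch below. The helper `output_stats` and the toy pixel values are assumptions for illustration; the PR does not show how its statistics were computed.

```python
import math
import statistics
from typing import Iterable, Tuple


def output_stats(pixels: Iterable[float]) -> Tuple[float, bool]:
    """Return (population std, all_finite) for a flat sequence of pixel values."""
    values = list(pixels)
    all_finite = all(math.isfinite(v) for v in values)
    # A std near zero usually indicates a blank or constant image.
    std = statistics.pstdev(values) if all_finite else float("nan")
    return std, all_finite


# Toy 4-pixel "image" alternating between black and white.
std, ok = output_stats([0.0, 255.0, 0.0, 255.0])
print(std, ok)
```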

Before submitting

Did you read the contributor guideline?
Did you read our philosophy doc?
Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case. → Release DreamLite on Hugging Face ByteVisionLab/DreamLite#3
Did you make sure to update the documentation with your changes? Here are the documentation guidelines.
Did you write any new necessary tests?

Who can review?

cc @sayakpaul @yiyixuxu @DN6 — thanks in advance for the review!

Add ByteDance's DreamLite model family to diffusers. DreamLite is a UNet-based diffusion model that supports both text-to-image generation and reference-image editing through a shared 3-branch dual-CFG design.

Two pipelines are shipped:
* DreamLitePipeline - full 3-branch dual CFG (negative, reference, prompt); supports T2I and I2I editing at 1024x1024.
* DreamLiteMobilePipeline - distilled single-branch variant for on-device inference; no CFG.

New model code (all isolated under *_dreamlite.py / unet_dreamlite.py to avoid touching shared upstream files):
* models/transformers/transformer_2d_dreamlite.py - DreamLite 2D transformer block.
* models/unets/unet_dreamlite.py - DreamLiteUNetModel.
* models/unets/unet_2d_blocks_dreamlite.py - DreamLite-specific down/up/mid blocks.
* models/resnet_dreamlite.py - DreamLite ResNet variants.
* models/attention_processor.py - add DreamLiteAttnProcessor2_0 (pure addition, no existing processor modified).

Pipeline + tests + docs:
* pipelines/dreamlite/{__init__.py, pipeline_dreamlite.py, pipeline_dreamlite_mobile.py, pipeline_output.py}.
* tests/pipelines/dreamlite/{test_pipeline_dreamlite.py, test_pipeline_dreamlite_mobile.py} with the standard PipelineTesterMixin suite; setUp/tearDown auto-patches encode_prompt with a fake so MagicMock text encoders work without per-test boilerplate.
* Skip 8 mixin tests that don't apply to DreamLite (MagicMock serialisation, custom attention processor, encode_prompt return shape, batch_size > 1 sweep), mirroring SD3 / Flux conventions.
* docs/source/en/api/pipelines/dreamlite.md + _toctree.yml entry (alphabetically between DiT and EasyAnimate).
* Register exports in 6 __init__.py files.

Two real bugs surfaced by the mixin test suite are fixed in this commit:
* num_images_per_prompt > 1: prompt_embeds and text_attention_mask are now repeated along the batch dimension in both pipelines' T2I and I2I branches before being passed to the UNet.
* vae=None: __init__ now guards the encoder_block_out_channels lookup so encode_prompt can be tested in isolation per PipelineTesterMixin convention.

SlowTests real-checkpoint resolution is set to 1024x1024 (the only size DreamLite is trained for).

Test result: 27 passed, 50 skipped, 0 failed on CPU fast suite. make style && make quality: clean.
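
The num_images_per_prompt fix described in this commit message depends on repeating each prompt's embedding consecutively along the batch dimension. The sketch below uses plain Python lists instead of tensors to show the resulting order; `repeat_interleave_batch` is a hypothetical stand-in for `torch.Tensor.repeat_interleave(n, dim=0)`, and the string entries stand in for embedding rows.

```python
from typing import List, TypeVar

T = TypeVar("T")


def repeat_interleave_batch(batch: List[T], repeats: int) -> List[T]:
    """Repeat each batch entry `repeats` times, keeping the copies adjacent."""
    return [item for item in batch for _ in range(repeats)]


prompt_embeds = ["emb_prompt_0", "emb_prompt_1"]
expanded = repeat_interleave_batch(prompt_embeds, repeats=2)
print(expanded)  # ['emb_prompt_0', 'emb_prompt_0', 'emb_prompt_1', 'emb_prompt_1']
```

The attention mask has to be expanded with the same call so that every embedding row stays paired with its own mask row.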

The `carlofkl/DreamLite-{base,mobile}` Hub repos host two flavours of the same checkpoint:
* `main` branch - keeps `model_index.json` pointing at ByteDance's internal package path so the original (non-diffusers) reference code can still load these weights.
* `diffusers` branch - rewrites the `unet` entry of `model_index.json` to `["diffusers", "DreamLiteUNetModel"]` so this integration loads correctly from `diffusers`.

This commit pins every `from_pretrained(...)` call shipped with the diffusers integration (docs examples, pipeline docstrings, SlowTests) to `revision="diffusers"`. Local-override env vars (DREAMLITE_BASE_PATH / DREAMLITE_MOBILE_PATH) still bypass the revision pin.
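
One way the revision pin and the local-override environment variable could interact is sketched below. The helper `resolve_checkpoint` and its return convention are assumptions for illustration; the PR names the environment variables but does not show the code that reads them.

```python
import os
from typing import Mapping, Optional, Tuple

HUB_REPO = "carlofkl/DreamLite-base"
PINNED_REVISION = "diffusers"


def resolve_checkpoint(env: Mapping[str, str] = os.environ) -> Tuple[str, Optional[str]]:
    """Return (path or repo id, revision) to pass to from_pretrained (hypothetical)."""
    local_path = env.get("DREAMLITE_BASE_PATH")
    if local_path:
        # A local directory has no Hub branches, so the pin is bypassed.
        return local_path, None
    return HUB_REPO, PINNED_REVISION


print(resolve_checkpoint({}))
print(resolve_checkpoint({"DREAMLITE_BASE_PATH": "/data/dreamlite-base"}))
```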

…ts after rebase

Mechanical changes after rebasing onto current `main`:
* `pipeline_dreamlite.py::retrieve_timesteps` - re-synced from `diffusers.pipelines.flux.pipeline_flux.retrieve_timesteps` (PEP 604 type hints, expanded docstring, plus the new `accepts_timesteps` / `accept_sigmas` introspection guards). DreamLite's default code path uses `num_inference_steps` (uniform schedule) and never passes custom `timesteps` / `sigmas`, so the added guards are dead code for this pipeline; behaviour is unchanged.
* `dummy_pt_objects.py` / `dummy_torch_and_transformers_objects.py` - registered the dummy classes auto-generated by `make fix-copies` for `DreamLiteTransformer2DModel`, `DreamLiteUNetModel`, `DreamLitePipeline`, `DreamLiteMobilePipeline`, `DreamLitePipelineOutput`. Generated by `make fix-copies`. No hand edits.
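
The introspection guards mentioned in this commit message check whether a scheduler's `set_timesteps` declares a given parameter before custom values are passed to it. A minimal sketch of that check, using a toy scheduler rather than a diffusers class (`UniformScheduler` and `accepts_argument` are hypothetical names):

```python
import inspect


class UniformScheduler:
    """Toy scheduler whose set_timesteps only supports a step count."""

    def set_timesteps(self, num_inference_steps, device=None):
        self.timesteps = list(range(num_inference_steps, 0, -1))


def accepts_argument(scheduler, name: str) -> bool:
    """Check whether scheduler.set_timesteps declares a parameter called `name`."""
    return name in inspect.signature(scheduler.set_timesteps).parameters


scheduler = UniformScheduler()
if not accepts_argument(scheduler, "sigmas"):
    # Default path: no custom sigmas, so use the uniform schedule.
    scheduler.set_timesteps(4)
print(scheduler.timesteps)
```

When a caller only ever supplies a step count, as the commit message says DreamLite does, the guard never has anything to reject.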

HuggingFaceDocBuilderDev · 2026-05-27T23:22:11Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

…ing entries
- Register DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (fixes check_support_list.py).
- Split combined 'height / width' and 'guidance_scale / image_guidance_scale' entries in the two pipeline docstrings; add a complete Args block to DreamLiteTransformer2DModel.forward (fixes check_forward_call_docstrings.py).

No behavioral change.

Carlofkl · 2026-05-28T04:40:30Z

Hi @sayakpaul @yiyixuxu — pushed a small follow-up commit (032412c) that fixes the two check_repository_consistency failures from the previous run:

Registered DreamLiteAttnProcessor2_0 in docs/source/en/api/attnprocessor.md (was missing — check_support_list.py).
Split combined height / width and guidance_scale / image_guidance_scale docstring entries in both pipelines into separate lines, and added a complete Args block to DreamLiteTransformer2DModel.forward (was tripping check_forward_call_docstrings.py).

No behavioral change — docs/docstrings only. Verified both lints pass locally.

Whenever convenient, could you re-approve the workflows? Thanks!

Carlofkl · 2026-06-02T08:33:11Z

Hi @yiyixuxu @DN6 @sayakpaul — quick update: CI is now fully green
(all 15 checks passing) and I've just marked the PR as ready for review.
Whenever you have a moment to take a look, I'd really appreciate it. Thanks!

dg845 · 2026-06-03T08:31:21Z

+# ---------------------------------------------------------------------------
+# Down blocks
+# ---------------------------------------------------------------------------
+def _make_down_block_class(class_name: str, *, remove_self_attn: bool):


Rather than having _make_down_block_class, I think we should define the down block classes directly:

# Version with both self attention and cross attention
class DreamLiteAttnDownBlock2D(nn.Module):
    ...

# Version with only cross attention
class DreamLiteCrossAttnDownBlock2D(nn.Module):
    ...

since this would be more clear, at the cost of some duplicated code. (I also think the current names are hard to follow, especially CrossAttnUpRemoveSelfAttnBlock2DV1DreamLite.)

dg845 · 2026-06-03T08:32:18Z

+# ---------------------------------------------------------------------------
+# Up blocks
+# ---------------------------------------------------------------------------
+def _make_up_block_class(class_name: str, *, remove_self_attn: bool):


Analogous comment to #13815 (comment): I think it would be better to define the two attention up block classes directly.

dg845 · 2026-06-03T08:33:39Z

+# ---------------------------------------------------------------------------
+# Plain resnet-only blocks (no attention)
+# ---------------------------------------------------------------------------
+class DownBlock2DDreamLite(nn.Module):


Suggested change

class DownBlock2DDreamLite(nn.Module):

class DreamLiteDownBlock2D(nn.Module):

nit: I think the above suggestion better follows the current diffusers model naming patterns.

dg845 · 2026-06-03T08:34:14Z

+        return hidden_states, output_states
+
+
+class UpBlock2DDreamLite(nn.Module):


Suggested change

class UpBlock2DDreamLite(nn.Module):

class DreamLiteUpBlock2D(nn.Module):

Similar comment to #13815 (comment).

dg845 · 2026-06-03T08:36:12Z

+from ..attention_processor import Attention, DreamLiteAttnProcessor2_0
+from ..normalization import RMSNorm
+from .unet_2d_blocks_dreamlite import (
+    CrossAttnDownBlock2DDreamLite,
+    CrossAttnDownRemoveSelfAttnBlock2DDreamLite,
+    CrossAttnUpBlock2DDreamLite,
+    CrossAttnUpRemoveSelfAttnBlock2DV1DreamLite,
+    DownBlock2DDreamLite,
+    UNetMidBlock2DCrossAttnDreamLite,
+    UpBlock2DDreamLite,
+)


I think implementing all of the DreamLite model blocks in a single file (like how recent transformer models are implemented) would better follow the current model design. CC @yiyixuxu

dg845 · 2026-06-03T08:38:42Z

+        device: torch.device,
+        dtype: torch.dtype,
+        image: Optional[Image.Image] = None,
+        max_sequence_length: int = 500,


It looks like max_sequence_length is currently unused in encode_prompt, is this intentional?

dg845 · 2026-06-03T08:39:42Z

+        text_pad_embedding: Optional[torch.Tensor] = None,
+    ):
+        if mode == "edit":
+            drop_idx = 64


Can we document what drop_idx means here? Would it be possible to get this value from e.g. self.processor instead of hardcoding it?
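
One possible reading, which the PR does not confirm, is that drop_idx counts the leading prompt-template tokens whose hidden states are discarded before conditioning. Under that assumption, the value could be derived from the template instead of being hardcoded. The sketch below uses a made-up template and whitespace splitting in place of a real tokenizer; `TEMPLATE` and `template_prefix_length` are hypothetical.

```python
from typing import Callable, List

# Hypothetical template; the real DreamLite template is not shown in the PR.
TEMPLATE = "<system> describe the image <user> {} <assistant>"


def template_prefix_length(template: str, tokenize: Callable[[str], List[str]]) -> int:
    """Count the tokens that precede the user-prompt placeholder."""
    prefix = template.split("{}", 1)[0]
    return len(tokenize(prefix))


# Whitespace splitting stands in for the model's tokenizer here.
drop_idx = template_prefix_length(TEMPLATE, str.split)
print(drop_idx)
```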

dg845 · 2026-06-03T08:40:34Z

+            )
+
+        elif mode == "generate":
+            drop_idx = 34


Analogous comment to #13815 (comment).

dg845 · 2026-06-03T08:48:36Z

+            if num_images_per_prompt > 1:
+                prompt_embeds = prompt_embeds.repeat_interleave(num_images_per_prompt, dim=0)
+                text_attention_mask = text_attention_mask.repeat_interleave(num_images_per_prompt, dim=0)
+            image_processed = self.image_processor.preprocess(image.resize((width, height), Image.Resampling.LANCZOS))


Suggested change

image_processed = self.image_processor.preprocess(image.resize((width, height), Image.Resampling.LANCZOS))

image_processed = self.image_processor.preprocess(image, height=height, width=width)

Would the above suggestion work? VaeImageProcessor's default resample value is "lanczos", so I think we should be able to call preprocess normally instead of manually resizing the image first.

dg845 · 2026-06-03T08:49:41Z

+                noise_pred = noise_pred[..., : latents.shape[-1]]
+                if task == "generate":
+                    noise_pred_uncond, noise_pred_cond = noise_pred.chunk(2)
+                    noise_pred = noise_pred_uncond + self._guidance_scale * (noise_pred_cond - noise_pred_uncond)


Suggested change

noise_pred = noise_pred_uncond + self._guidance_scale * (noise_pred_cond - noise_pred_uncond)

noise_pred = noise_pred_uncond + self.guidance_scale * (noise_pred_cond - noise_pred_uncond)

nit: I think we should use the guidance_scale property here.
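
The property pattern referred to here can be sketched with a toy class: the value is stored on a private attribute at call time and read back through a public read-only property. `GuidanceState` is a hypothetical stand-in, not the DreamLite pipeline.

```python
class GuidanceState:
    """Toy stand-in for a pipeline that stores guidance settings per call."""

    def __init__(self) -> None:
        self._guidance_scale = 1.0

    @property
    def guidance_scale(self) -> float:
        # Read-only public accessor for the private attribute.
        return self._guidance_scale

    def __call__(self, guidance_scale: float) -> float:
        self._guidance_scale = guidance_scale
        uncond, cond = 0.0, 1.0
        # The denoising code reads the property, not the private attribute.
        return uncond + self.guidance_scale * (cond - uncond)


pipe = GuidanceState()
result = pipe(guidance_scale=4.5)
print(result)
```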

dg845 · 2026-06-03T08:50:23Z

+                        + self._guidance_scale * (noise_pred_text - noise_pred_image)
+                        + self._image_guidance_scale * (noise_pred_image - noise_pred_uncond)
+                    )
+


Suggested change

noise_pred = (

noise_pred_uncond

+ self.guidance_scale * (noise_pred_text - noise_pred_image)

+ self.image_guidance_scale * (noise_pred_image - noise_pred_uncond)

)

nit: similar comment to #13815 (comment).

dg845 · 2026-06-03T08:53:25Z

+from ..stable_diffusion.pipeline_stable_diffusion_img2img import retrieve_latents
+from .pipeline_dreamlite import calculate_shift, retrieve_timesteps


Suggested change

from ..stable_diffusion.pipeline_stable_diffusion_img2img import retrieve_latents

from .pipeline_dreamlite import calculate_shift, retrieve_timesteps

We prefer to copy helper functions like these and use the # Copied from mechanism to sync the implementations, similar to how retrieve_timesteps is implemented in src/diffusers/pipelines/dreamlite/pipeline_dreamlite.py.
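
As a sketch of what this would look like in pipeline_dreamlite_mobile.py, the helper is duplicated locally and annotated so that `make fix-copies` can keep it in sync. The body below is the Flux `calculate_shift` as I understand it; whether DreamLite's version is identical to the Flux one is an assumption, since the PR only states that `retrieve_timesteps` was synced from Flux.

```python
# Copied from diffusers.pipelines.flux.pipeline_flux.calculate_shift
def calculate_shift(
    image_seq_len,
    base_seq_len: int = 256,
    max_seq_len: int = 4096,
    base_shift: float = 0.5,
    max_shift: float = 1.15,
):
    # Linear interpolation of the shift between the base and max sequence lengths.
    m = (max_shift - base_shift) / (max_seq_len - base_seq_len)
    b = base_shift - m * base_seq_len
    mu = image_seq_len * m + b
    return mu


print(calculate_shift(256), calculate_shift(4096))
```

The comment marker must name the fully qualified source of the copy; the consistency check then reports any drift between the two implementations.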

dg845 · 2026-06-03T08:56:20Z

+    @staticmethod
+    def _extract_masked_hidden(hidden_states: torch.Tensor, mask: torch.Tensor) -> List[torch.Tensor]:


Suggested change

@staticmethod

def _extract_masked_hidden(hidden_states: torch.Tensor, mask: torch.Tensor) -> List[torch.Tensor]:

@staticmethod

# Copied from diffusers.pipelines.dreamlite.pipeline_dreamlite.DreamLitePipeline._extract_masked_hidden

def _extract_masked_hidden(hidden_states: torch.Tensor, mask: torch.Tensor) -> List[torch.Tensor]:

We should use the # Copied from mechanism for all copied helper methods so that the implementations are synced.

dg845

Thanks for the PR! Left an initial design review :).

dg845 · 2026-06-03T09:11:50Z

Also, if I test out the example using the following script:

import torch
from diffusers import DreamLitePipeline
from diffusers.utils import load_image

model_id = "carlofkl/DreamLite-base"
device = "cuda"
dtype = torch.float16

pipe = DreamLitePipeline.from_pretrained(model_id, revision="diffusers", torch_dtype=dtype)
pipe.to(device=device)

# Text-to-image
image = pipe(
    prompt="A serene mountain lake at sunrise",
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

image.save("dreamlite_t2i.png")

# Image-to-image (instruction-based edit)
image_url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/diffusers/astronaut.jpg"
init_image = load_image(image_url)
edited = pipe(
    prompt="make it snowy",
    image=init_image,
    generator=torch.Generator(device=device).manual_seed(42),
).images[0]

edited.save("dreamlite_i2i.png")

I get the following T2I sample:

and the following I2I sample:

Is the sample quality expected? The T2I image in particular has a weird block pattern.

Carlofkl added 3 commits May 27, 2026 11:38

github-actions bot added the size/L (PR with diff > 200 LOC), documentation (Improvements or additions to documentation), models, tests, utils, and pipelines labels and removed the size/L (PR with diff > 200 LOC) label on May 27, 2026

Carlofkl mentioned this pull request May 27, 2026

Release DreamLite on Hugging Face ByteVisionLab/DreamLite#3

Open

Merge branch 'main' into feature/dreamlite-integration

0b7d747

github-actions Bot added the size/L PR with diff > 200 LOC label May 27, 2026

Carlofkl marked this pull request as ready for review May 31, 2026 14:18

Merge branch 'main' into feature/dreamlite-integration

7d9bd46

sayakpaul requested review from dg845 and yiyixuxu June 2, 2026 08:34

